(54) A method and apparatus for displaying references to a user's document browsing history within the context of a new document



(57) An electronic document reader supplements a new document with selectable links that reference portions of previously read documents that have content similar to a passage of the new document. The portions of the previously read documents may be previously identified and stored in a memory to expedite processing. The portions of previously read documents are identified by annotations or explicitly by a user. The identified portions are indexed, clustered and used as proxies for topics. This invention segments a new document into passages and matches the passages to the stored topics based on the similarity of the content. The topics that exceed a content similarity threshold cause corresponding selectable links to be displayed in the display of the new document near the corresponding passage. The user of this invention can then choose to follow the selectable link to learn more about the topic of the corresponding segment. In this manner, this invention aids a reader in connecting material in the new document with material in previously read documents.
Description

[0001] This invention relates to electronic document reading systems. More particularly, this invention relates to electronic document reading systems that supplement a document with links to related portions of previously read and annotated documents.
[0002] When a person reads documents for comprehension, the reader establishes relationships between the document currently being read and other, previously read, documents. The reader then uses these relationships to decide whether the new document should be read in detail (for example, because it complements previously read information) or whether it should be skipped because it only duplicates the previously read documents.

[0003] Conventional systems and methods of conveying document interrelationships to the reader typically rely on graphical visualizations of documents where each document is represented by a point. One such system is described in "Bead: Explorations in Information Visualization", M. Chalmers et al., Proceedings of SIGIR '92, pp. 330-337, ACM Press (1992), incorporated herein by reference in its entirety. Document relationships can also be represented as a geometric shape. One such system is described in "Visualizing Cyberspace: Information Visualization in the Harmony Internet Browser", K. Andrews, Proceedings of the IEEE Information Visualization Symposium '95, pp. 97-104, IEEE Press (1995), incorporated herein by reference in its entirety. While these techniques efficiently represent very large numbers of documents, they do not effectively represent the content of those documents. By suppressing the documents' content when displaying relationships, these systems make it difficult for a reader to understand why and how the documents are related.
[0004] Remembrance Agent displays a list of documents that are related to the user's current context while the user enters text. This system is described in "A Continuously Running Automated Information Retrieval System", B. Rhodes et al., Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM '96), pp. 487-495 (1996), incorporated herein by reference in its entirety. However, Remembrance Agent does not encourage reading or browsing, because the suggestions are ephemeral and disappear when additional text is entered. Furthermore, Remembrance Agent does not consider the user's interests: no preference is given to passages that the user found interesting or relevant over other, potentially irrelevant, portions of documents.

[0005] Bookmark Organizer presents a hierarchical list of documents, but leaves it to the user to find the appropriate documents. This system is described in "Automatically Organizing Bookmarks Per Contents", Y. Maarek et al., Computer Networks and ISDN Systems, 28, pp. 1321-1333 (1996), incorporated herein by reference in its entirety. In addition, although bookmarks identify the potentially interesting documents, they do not identify the passages that are of specific interest.

[0006] There is thus a need for an electronic document reader that supplements new documents with selectable links to relevant and annotated portions of previously read documents.

[0007] This invention provides a system and method that automatically constructs relationships among segments of different documents. Portions of a document that have been identified as being interesting to a user are extracted from previously read documents. These portions may have been explicitly identified to the system by the reader, or the system may infer the relevance of a portion from cues, such as annotations, that the user made to the documents. The identified portions, or "surrogates", are indexed and linked to the original documents and are used as proxies for the user's various interests. The portions are clustered based upon their relatedness to each other, so that each cluster of portions relates to a topic.
[0008] When a new document is opened, it is segmented into passages and the passages are compared to the portions from the previously read documents. If a passage of the new document is identified as being closely related to a portion, then a selectable link is provided in the new document to the old document from which the identified portion originated. The user may then select the link to the old document and read that portion, which reminds the reader of, or refreshes the reader's understanding of, the related material. In this manner, the user's understanding of the new document is enhanced.

[0009] These and other features and advantages of this invention are described in or are apparent from the following detailed description of the preferred embodiments.

[0010] The preferred embodiments of this invention will be described in detail, with reference to the following figures, wherein:

Fig. 1 is a block diagram of one embodiment of the electronic document reader of this invention;

Fig. 2 is a flow chart outlining how the portions are formed and stored;

Fig. 3 is a flowchart outlining the control routine of one embodiment of the method of this invention;

Fig. 4 shows a display of a new document with selectable links to previously-read documents according to this invention;

Fig. 5 shows a display of a previously-read document that is referenced by a selectable link in the display of the new document shown in Fig. 4;

Fig. 6 shows a display of another previously-read document that is referenced by a second selectable link in the display of the new document shown in Fig. 4; and

Fig. 7 is a block diagram of one embodiment of the processing system of this invention.

[0011] Fig. 1 shows one embodiment of an electronic document reading system 10 of this invention. The electronic document reading system 10 includes a processor 12 communicating with a first memory 14 that stores previously read and annotated documents 16, and a second memory 18 that stores a new document 20 that is currently being read and displayed for a user on a display 22. The processor 12 also communicates with a third memory 21 that stores "surrogates", i.e., portions 23 of the previously-read documents 16. The processor 12 controls the display 22 to display the new document 20 to the user of the electronic document reading system 10. The processor 12 also communicates with an I/O interface 24 that, in turn, communicates with any number of conventional I/O devices 26, such as a keyboard 28, a mouse 30 and a pen 32. The I/O devices 26 are operated by a user to control the operation of the electronic document reading system 10.
[0012] As shown in Fig. 1, the system 10 is preferably implemented using a programmed general purpose computer. However, the system 10 can also be implemented using a special purpose computer, a programmed microprocessor or microcontroller and any necessary peripheral integrated circuit elements, an ASIC or other integrated circuit, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA or PAL, or the like. In general, any device capable of implementing a finite state machine that, in turn, implements the flowcharts shown in Figs. 2 and 3 can be used to implement the system 10.

[0013] Additionally, as shown in Fig. 1, the memories 14, 18 and 21 are preferably implemented using static or dynamic RAM. However, the memories 14, 18 and 21 can also be implemented using a floppy disk and disk drive, a writable optical disk and disk drive, a hard drive, flash memory or the like. Additionally, it should be appreciated that the memories 14, 18 and 21 can be either distinct portions of a single memory or physically distinct memories.

[0014] Further, it should be appreciated that the links 15, 17 and 19 connecting the memories 14, 18 and 21 to the processor 12 can be wired or wireless links to a network (not shown). The network can be a local area network, a wide area network, an intranet, the Internet, or any other distributed processing and storage network. In this case, the electronic document 20, the previously read and annotated documents 16 and the document surrogates 23 are pulled from the physically remote memories 14, 18 and 21 through the links 15, 17 and 19 for processing in the system 10 according to the method outlined below. These items can be stored locally in some other memory device of the system 10 (not shown).

[0015] The method of this invention relies on at least two subprocesses. The first subprocess maintains a list of document portions 23, and the second subprocess matches the document portions 23 to passages from the new document 20. The results of any matches are displayed to the reader as selectable links in the display of the new document 20, in proximity to the matching passages of the new document 20.
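Both subprocesses begin by segmenting a document (steps S110 and S210), but the text does not fix a segmentation rule. The following Python sketch assumes, purely for illustration, that a passage is a paragraph separated from its neighbours by blank lines, and it records each passage's character offset so that a link can later be placed beside it. The function name `segment_passages` is hypothetical.

```python
import re

# One or more consecutive lines that each contain at least one non-whitespace character.
_PARAGRAPH = re.compile(r"(?:^[ \t]*\S.*(?:\n|\Z))+", re.MULTILINE)


def segment_passages(document: str) -> list[tuple[int, str]]:
    """Split a document into passages, assuming a passage is a blank-line-separated paragraph.

    Returns (offset, text) pairs; the offset lets a link be displayed next to its passage.
    """
    passages = []
    for match in _PARAGRAPH.finditer(document):
        raw = match.group()
        leading = len(raw) - len(raw.lstrip())  # skip indentation before the first character
        passages.append((match.start() + leading, raw.strip()))
    return passages


doc = "First paragraph, line one.\nline two.\n\n  Second paragraph.\n\n\nThird."
for offset, text in segment_passages(doc):
    print(offset, repr(text))
```

Finer or coarser units (sentences, sections, pages) would work the same way; only the pattern changes.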
[0016] A third, optional subprocess clusters the document portions 23 based upon their relatedness to each other. Each cluster then approximates a topic. Clustering speeds up processing because it lowers the number of portions that must be compared to the new document. The attributes of the clusters are compared to the passages of the new document and, once a cluster is identified, the portions within the identified cluster are analyzed. In this manner, the number of portions that are analyzed is greatly reduced, because only the portions within an identified cluster are analyzed rather than all portions.
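The saving can be quantified with simple counting. Assuming every similarity evaluation costs the same, a brute-force search compares a passage with all N portions, whereas the clustered search compares it with the K sets of cluster attributes and then only with the portions of the clusters that pass the threshold. The Python sketch below, with hypothetical function names and cluster sizes, just counts those comparisons.

```python
def comparisons_brute_force(cluster_sizes: list[int]) -> int:
    """Every stored portion is compared with the passage."""
    return sum(cluster_sizes)


def comparisons_clustered(cluster_sizes: list[int], matched: set[int]) -> int:
    """One comparison per set of cluster attributes, plus the portions of the matched clusters."""
    return len(cluster_sizes) + sum(cluster_sizes[i] for i in matched)


sizes = [40, 25, 60, 15, 10]               # hypothetical: 150 portions in 5 clusters
print(comparisons_brute_force(sizes))      # 150
print(comparisons_clustered(sizes, {1}))   # 5 + 25 = 30
```

The benefit shrinks as more clusters match; in the worst case, when every cluster passes the threshold, the clustered search costs K + N comparisons.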
[0017] It should be appreciated that these subprocesses will generally run concurrently in the background. In particular, as the new document 20 is read and annotated by the user, the subprocess outlined in Fig. 2 generates new portions 23 to be used when reading a subsequent document. At the same time, when the new document 20 is opened, the subprocess outlined in Fig. 3 checks the portions 23 generated from previous documents 16 for relevance to the passages of the new document 20.

[0018] Fig. 2 is a flowchart outlining how the previously-read documents 16 are analyzed to identify, store and cluster their portions. Preferably, the previously read documents 16 have been annotated by the user so that the surrogates 23 that the user found interesting can be identified and extracted into the memory 21. Starting in step S100, the control routine continues to step S110, where the system segments the documents into portions. The control routine then continues to step S120, where the portions having annotations are identified. Then, in step S130, the control routine stores the identified portions with references to the underlying annotated pages. Then, in step S140, the portions 23 are clustered using similarity metrics to identify major themes or topics of interest to the user. Similarity metrics are well known, and are described in, for example, "Introduction to Modern Information Retrieval", G. Salton et al., McGraw-Hill, 1983, incorporated herein by reference in its entirety. A set of cluster attributes is also determined for each cluster. Next, the control routine continues to step S150, where the control routine stops.
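Steps S110 to S140 can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: each previously read document is given as a list of already-segmented portions plus the indices the user annotated; portions are compared as bag-of-words vectors with cosine similarity; and step S140 uses single-pass "leader" clustering with a hypothetical threshold of 0.3, with each cluster's summed term vector serving as its set of cluster attributes. The text leaves the clustering method open, so any other similarity-based method could be substituted. The sample document IDs reuse the labels of Fig. 4, while the portion texts are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def terms(text: str) -> Counter:
    """Bag-of-words term frequencies over lower-cased alphanumeric tokens (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[t] for t, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Surrogate:
    doc_id: str      # reference back to the annotated source document (S130)
    index: int       # position of the portion within that document
    text: str
    vector: Counter


@dataclass
class Cluster:
    members: list = field(default_factory=list)
    attributes: Counter = field(default_factory=Counter)  # summed term vector of the members


def build_surrogates(documents: dict[str, tuple[list[str], set[int]]]) -> list[Surrogate]:
    """S120-S130: keep only the annotated portions, each with a reference to its source."""
    return [
        Surrogate(doc_id, i, text, terms(text))
        for doc_id, (portions, annotated) in documents.items()
        for i, text in enumerate(portions)
        if i in annotated
    ]


def cluster_surrogates(surrogates: list[Surrogate], threshold: float = 0.3) -> list[Cluster]:
    """S140: join the most similar existing cluster, or start a new one below the threshold."""
    clusters: list[Cluster] = []
    for s in surrogates:
        best = max(clusters, key=lambda c: cosine(s.vector, c.attributes), default=None)
        if best is None or cosine(s.vector, best.attributes) < threshold:
            best = Cluster()
            clusters.append(best)
        best.members.append(s)
        best.attributes.update(s.vector)  # Counter.update adds term counts
    return clusters


docs = {
    "SAAL93": (["Vector space models rank documents by term weights.",
                "Unrelated remark about the weather."], {0}),
    "SAMC83": (["Term weights in the vector space model.",
                "Clustering groups similar documents together."], {0, 1}),
}
clusters = cluster_surrogates(build_surrogates(docs))
for c in clusters:
    print([(s.doc_id, s.index) for s in c.members])
```

Single-pass clustering depends on the order in which portions arrive, which suits the incremental, background operation of [0019] but is only one of many reasonable choices.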
[0019] Preferably, steps S100-S150 are performed continuously in the background as a user reads documents, to create an extensive set of clusters of portions of previously read documents.

[0020] Fig. 3 is a flow chart outlining the control routine of one embodiment of the method of this invention. Beginning in step S200, the control routine continues to step S210, where a new document 20 is segmented into passages. Then, in step S220, similarity measures or scores are determined between a selected passage of the new document 20 and each set of cluster attributes in the memory 21. Next, in step S230, the control routine identifies those clusters whose similarity measures exceed a predetermined or user-specified threshold; alternatively, the system may identify the cluster with the highest similarity score. If the system identifies cluster(s) having a similarity score exceeding the threshold, then control continues to step S240, where the system determines similarity scores for each portion in the identified cluster(s). The control routine then continues to step S250, where the system identifies those portions that have a similarity score exceeding a predetermined or user-specified threshold. Control then continues to step S260, where the control routine displays, for each identified portion, one link to the appropriate old document and associates the generated links with the corresponding passage of the new document. Control then continues to step S270. Alternatively, only a link to the document containing the portion having the highest similarity score may be generated in the new document. Lastly, if no similarity measures exceed the threshold in steps S230 or S250, then control jumps directly to step S270.
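Steps S220 to S270 amount to a two-stage thresholded search. The Python sketch below assumes the passages are already segmented and uses a bag-of-words cosine similarity as the similarity measure; the thresholds (0.2 for clusters, 0.3 for portions), the summed-term-vector cluster attributes and the names `links_for_passages`, `Portion`, `Cluster` and `Link` are all hypothetical. The sample document IDs reuse the labels of Fig. 4; the texts are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def terms(text: str) -> Counter:
    """Bag-of-words term frequencies (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[t] for t, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Portion:
    doc_id: str   # previously read document the portion came from
    text: str


@dataclass
class Cluster:
    portions: list
    attributes: Counter = field(default_factory=Counter)

    def __post_init__(self) -> None:
        # Assumed cluster attributes: the summed term vector of the member portions.
        for p in self.portions:
            self.attributes.update(terms(p.text))


@dataclass
class Link:
    passage_index: int
    doc_id: str
    portion: str
    score: float


def links_for_passages(passages: list[str], clusters: list[Cluster],
                       cluster_threshold: float = 0.2,
                       portion_threshold: float = 0.3) -> list[Link]:
    links = []
    for index, passage in enumerate(passages):    # S270: repeat for every passage
        vec = terms(passage)
        # S220-S230: keep clusters whose attributes exceed the cluster threshold.
        matched = [c for c in clusters if cosine(vec, c.attributes) > cluster_threshold]
        # S240-S250: score only the portions inside the matched clusters.
        for cluster in matched:
            for p in cluster.portions:
                score = cosine(vec, terms(p.text))
                if score > portion_threshold:
                    # S260: one link per identified portion, attached to this passage.
                    links.append(Link(index, p.doc_id, p.text, score))
    return links


clusters = [
    Cluster([Portion("SAAL93", "Vector space models rank documents by term weights."),
             Portion("SAMC83", "Term weights in the vector space model.")]),
    Cluster([Portion("SAMC83", "Clustering groups similar documents together.")]),
]
passages = ["This paper ranks documents with term weights in a vector space.",
            "The weather was pleasant."]
links = links_for_passages(passages, clusters)
for link in links:
    print(link.passage_index, link.doc_id, round(link.score, 2))
```

The alternative described in the text, keeping only the best-scoring cluster or portion, would replace the threshold filters with a `max(...)` over the scores.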

[0021] In step S270, the control routine determines whether any unchecked passages exist in the new document 20. If so, control returns to step S220, where the next passage of the new document 20 is selected. Otherwise, control continues to step S280, where the control routine determines whether one of the selectable links of a currently displayed document, such as the new document 20, has been selected. If one of the selectable links has been selected, control continues to step S290. Otherwise, if no selectable link is selected in step S280, the control routine jumps directly to step S300. In step S290, the corresponding old document 16 is displayed on the display 22 in place of the currently displayed document, such as the new document 20 or a previous old document 16. Preferably, the display is centered on the corresponding portion of the old document 16. The control routine then continues to step S300.
[0022] In step S300, the control routine determines whether the user has closed the currently displayed document 20 or 16. If not, control returns to step S280. Otherwise, control continues to step S310.
[0023] In step S310, the control routine determines whether any document 16 or 20 remains open. If so, control returns to step S280. Otherwise, control continues to step S320, where the control routine stops.
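The interactive part of Fig. 3 (steps S280 to S320) is a small event loop over the set of open documents. The following Python sketch models only its state. The class and method names are hypothetical, and the rule for choosing what to display after the displayed document is closed (here, the last entry in the list of open documents) is an assumption the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReaderSession:
    """Hypothetical state of the display loop in steps S280-S320."""
    open_docs: list = field(default_factory=list)  # documents 16 or 20 that are still open
    displayed: Optional[str] = None                 # currently displayed document
    focus: Optional[str] = None                     # portion the display is centered on, if any

    def open(self, doc_id: str) -> None:
        if doc_id not in self.open_docs:
            self.open_docs.append(doc_id)
        self.displayed, self.focus = doc_id, None

    def follow_link(self, doc_id: str, portion: str) -> None:
        """S290: show the linked old document in place of the current one, centered on the portion."""
        self.open(doc_id)
        self.focus = portion

    def close_displayed(self) -> bool:
        """S300-S310: close the displayed document; False means none remains open (S320)."""
        if self.displayed is not None:
            self.open_docs.remove(self.displayed)
        # Assumption: fall back to the last entry in the list of open documents.
        self.displayed = self.open_docs[-1] if self.open_docs else None
        self.focus = None
        return bool(self.open_docs)


session = ReaderSession()
session.open("new document 20")
session.follow_link("SAAL93", "portion 36'")   # the user selects link 34'
print(session.displayed, session.focus)
while session.close_displayed():               # loop until no document remains open
    print("now displaying", session.displayed)
```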
[0024] Figs. 4-6 show the various documents and links displayed on the display 22 during the operation of one embodiment of the system of this invention according to one embodiment of the method of this invention. In Fig. 4, the display 22 shows the user a new document 20, along with selectable links 34' and 34". The selectable links 34' and 34" do not interfere with or interrupt reading, because the links 34' and 34" appear in a margin of the new document 20.
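As a plain-text stand-in for the graphical display of Fig. 4, the Python sketch below lays passages out in a text column and prints each passage's link labels in a right-hand margin on that passage's first line, so the labels sit beside the related text without breaking it up. The function name, the bracketed label style and the column width are illustrative assumptions; the passage text is invented, while the labels are those of Fig. 4.

```python
import textwrap


def render_with_margin(passages: list[str], links: dict[int, list[str]], width: int = 48) -> str:
    """Render passages in a column of `width` characters with link labels in a right margin.

    `links` maps a passage index to labels such as "SAAL93"; labels go on the passage's first line.
    """
    out = []
    for i, passage in enumerate(passages):
        wrapped = textwrap.wrap(passage, width) or [""]
        labels = "  ".join(f"[{label}]" for label in links.get(i, []))
        for j, line in enumerate(wrapped):
            margin = labels if j == 0 else ""
            out.append(f"{line:<{width}}  {margin}".rstrip())
        out.append("")  # blank line between passages
    return "\n".join(out).rstrip("\n")


page = render_with_margin(
    ["This paper ranks documents with term weights in a vector space model.",
     "The weather was pleasant."],
    {0: ["SAAL93", "SAMC83"]},
    width=40,
)
print(page)
```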

[0025] If the user selects one of the selectable links 34' and/or 34", then the display 22 displays the corresponding document 16' or 16". For instance, if the user selects the selectable link 34', which is labeled "SAAL93", then, as shown in Fig. 5, the display 22 shows the corresponding old document 16' that includes the corresponding identified portion 36'. Alternatively, if the user selects the selectable link 34", which is labeled "SAMC83", then, as shown in Fig. 6, the display 22 shows the corresponding old document 16" that includes the corresponding identified portion 36".

[0026] In one embodiment of this invention, when a selectable link of the currently displayed document 20 or 16 is selected, the corresponding old document 16 is displayed as the new currently displayed document. The corresponding old document 16 is displayed with its selectable links 34 displayed in the margin. Thus, in this case, the old document 16 has become the currently displayed document, and the displayed selectable links 34 link the displayed old document 16 to other previously read and annotated documents 16. In this manner, a user of this invention can follow a trail of links, jumping from document to document to understand a topic. Accordingly, in this embodiment, if the old document 16 has existing selectable links 34, its selectable links 34 can be displayed. Furthermore, the old document 16 can be updated with additional selectable links to subsequently read documents.
[0027] Fig. 7 shows a block diagram of one preferred embodiment of the processor 12 of this invention. As shown in Fig. 7, the processor 12 is preferably implemented using a general purpose computer 100. The general purpose computer 100 preferably includes a controller 110, a segmenting system 120, a selecting system 130, a clustering system 140 and an identifying system 150. These elements of the general purpose computer 100 are interconnected by a bus 160.

[0028] The segmenting system 120 and the clustering system 140, controlled by the controller 110, are used to implement the flowchart shown in Fig. 2. The segmenting system 120 and the selecting system 130, controlled by the controller 110, are used to implement the flow chart shown in Fig. 3. It should be appreciated that the segmenting system 120, the selecting system 130, the clustering system 140 and the identifying system 150 are preferably implemented as software routines running on the controller 110 and stored in a memory of the general purpose computer 100. It should be appreciated that many other implementations of these elements will be apparent to those skilled in the art.
[0029] It should be understood that the term "annotation" as used herein is intended to include text, digital ink, audio, video or any other input associated with a document. It should also be understood that the term "document" is intended to include a text document, a video document, an audio document, any other information-storing document, and any combination of information-storing documents. The term "document" is also intended to include passages from documents and is not to be limited to whole or entire documents. Further, it should be understood that the term "text" is intended to include text, graphic images, digital ink, audio, video or any other content of a document, including the document's structure. A document's structure is intended to include any divisible portion of a document, such as a word, sentence, paragraph, section, chapter, volume, page, etc.

[0030] As described above, the passages of new documents are compared with portions, or clusters of portions, of previously read documents to determine the similarity between them. This similarity analysis may be performed with any number or type of similarity, relatedness or relevance algorithms.
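One concrete instance of such a measure is the vector-space cosine similarity familiar from information retrieval, cos(a, b) = (a · b) / (|a| |b|). The Python sketch below assumes lower-cased alphanumeric tokens and raw term-frequency weights; TF-IDF weighting, stemming or an entirely different relatedness measure could be substituted without changing the surrounding method. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term frequencies over lower-cased alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 0.0 when either text has no terms."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(cosine_similarity("term weights", "Term weights"))  # 1.0 (same terms)
print(cosine_similarity("term weights", "vector space"))  # 0.0 (no shared terms)
print(cosine_similarity("term weights", "term vector"))   # 0.5 (one of two terms shared)
```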
[0031] While this invention has been described with the specific embodiments outlined above, many alternatives, modifications and variations are apparent to those skilled in the art. Accordingly, the preferred embodiments described above are illustrative and not limiting. Various changes may be made without departing from the spirit and scope of the invention as defined in the following claims.



Claims 

1. A method for providing a selectable link in a display of a first document to at least one portion of at least one second document, the method comprising:

segmenting the first document into a plurality of passages;

identifying at least one portion of the at least one second document having content similar to at least one of the plurality of passages; and
displaying in the first document, for each such portion, a selectable link to the second document containing that identified portion.

2. The method of claim 1, further comprising:

determining if a selectable link is selected; and
displaying at least one portion of the second document corresponding to the selected link.

3. The method of claim 1 or claim 2, further comprising:

segmenting each at least one second document into a plurality of portions; and
storing the plurality of portions into a memory.

4. The method of any of claims 1 to 3, wherein displaying the selectable link comprises displaying each link in a margin of the first document proximate to the determined passage that is similar to an identified portion.

5. The method of any of claims 1 to 3, wherein displaying the selectable link comprises displaying the passage as the selectable link to the corresponding at least one portion.

6. An apparatus that provides, in a display of a first document, selectable links to at least one portion of at least one second document, the apparatus comprising:

a processing system, comprising:

a segmenting system that segments the first document into a plurality of passages, and
an identifying system that identifies at least one of a plurality of portions of the at least one second document that is similar in content to at least one passage of the first document; and
a display that displays the first document and at least one selectable link, each selectable link linking a passage of the first document to a corresponding one of the at least one second document having at least one portion that is similar in content to that segment.

7. The apparatus of claim 6, wherein the processing system further comprises a selection device for selecting at least one of the at least one selectable link, the display displaying the corresponding at least one portion of the at least one second document based on the selected selectable link.

8. The apparatus of claim 6 or claim 7, wherein the segmenting system segments each at least one second document to generate the plurality of portions.

9. The apparatus of any of claims 6 to 8, wherein the identifying system identifies each similar portion based on a similarity of each passage to a cluster of the at least one portion.

10. The apparatus of any of claims 6 to 9, wherein the display displays the at least one selectable link as the display of the passage that corresponds to the at least one portion.